library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.2.1     ✓ purrr   0.3.3
✓ tibble  2.1.3     ✓ dplyr   0.8.3
✓ tidyr   1.0.0     ✓ stringr 1.4.0
✓ readr   1.3.1     ✓ forcats 0.4.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
library(lubridate)
Attaching package: ‘lubridate’
The following object is masked from ‘package:base’:
date
Here is our first task:
The project goal is to identify patients seen for drug overdose, determine if they had an active opioid at the start of the encounter, and if they had any readmissions for drug overdose.
Your task is to assemble the study cohort by identifying encounters that meet the following criteria:
Sounds great. Let’s start by taking a look at the data.
allergies <- read_csv("datasets/allergies.csv")
Parsed with column specification:
cols(
  START = col_date(format = ""),
  STOP = col_date(format = ""),
  PATIENT = col_character(),
  ENCOUNTER = col_character(),
  CODE = col_double(),
  DESCRIPTION = col_character()
)
allergies
encounters <- read_csv("datasets/encounters.csv")
Parsed with column specification:
cols(
  Id = col_character(),
  START = col_datetime(format = ""),
  STOP = col_datetime(format = ""),
  PATIENT = col_character(),
  PROVIDER = col_character(),
  ENCOUNTERCLASS = col_character(),
  CODE = col_double(),
  DESCRIPTION = col_character(),
  COST = col_double(),
  REASONCODE = col_double(),
  REASONDESCRIPTION = col_character()
)
encounters
medications <- read_csv("datasets/medications.csv")
Parsed with column specification:
cols(
  START = col_date(format = ""),
  STOP = col_date(format = ""),
  PATIENT = col_character(),
  ENCOUNTER = col_character(),
  CODE = col_double(),
  DESCRIPTION = col_character(),
  COST = col_double(),
  DISPENSES = col_double(),
  TOTALCOST = col_double(),
  REASONCODE = col_double(),
  REASONDESCRIPTION = col_character()
)
medications
patients <- read_csv("datasets/patients.csv")
Parsed with column specification:
cols(
  .default = col_character(),
  BIRTHDATE = col_date(format = ""),
  DEATHDATE = col_date(format = ""),
  ZIP = col_double()
)
See spec(...) for full column specifications.
patients
procedures <- read_csv("datasets/procedures.csv")
Parsed with column specification:
cols(
  DATE = col_date(format = ""),
  PATIENT.x = col_character(),
  ENCOUNTER = col_character(),
  CODE.x = col_character(),
  DESCRIPTION.x = col_character(),
  COST.x = col_double(),
  REASONCODE.x = col_double(),
  REASONDESCRIPTION.x = col_character()
)
procedures
Ok, we are chiefly interested in the encounters table and want to filter it according to the specifications given in the task. Let’s start by filtering the encounters for drug overdoses. According to the data dictionary sheet for the encounters table, the REASONCODE column contains SNOMED-CT codes.
We can look up the code for a drug overdose at https://browser.ihtsdotools.org/, which lists it as 55680006.
drug_overdoses <- filter(encounters, REASONCODE == 55680006)
drug_overdoses
Great, now we just need to filter for encounters that occur after July 15, 1999.
The encounters table has two columns that represent the date of the encounter: START and STOP. Further clarification would be necessary to determine whether the task is to find encounters that begin after 07/15/1999 or encounters that end after that date. For the purposes of this exercise, we’ll go with encounters that begin after that date, because the specification uses the term “occur”.
after_date <- filter(drug_overdoses, START > "1999-07-15")
arrange(after_date, START)
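One more boundary question is whether an encounter that starts later on July 15 itself counts as “after” that date. As far as we can tell, comparing a date-time column against the string "1999-07-15" treats the string as midnight at the start of that day, so same-day encounters pass the filter. The sketch below contrasts the two readings on made-up rows; `toy_encounters`, `cutoff`, `after_midnight`, and `after_day` are hypothetical names and the data is not taken from encounters.csv.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Made-up encounters around the cutoff (not real data)
toy_encounters <- tibble(
  Id = c("a", "b", "c"),
  START = ymd_hms(c("1999-07-14 23:00:00",
                    "1999-07-15 08:30:00",
                    "1999-07-16 00:00:00"))
)

# Reading 1: anything later than midnight at the start of July 15 (UTC)
cutoff <- ymd_hms("1999-07-15 00:00:00")
after_midnight <- filter(toy_encounters, START > cutoff)

# Reading 2: only encounters that start on July 16 or later
after_day <- filter(toy_encounters, as_date(START) > ymd("1999-07-15"))

after_midnight$Id
after_day$Id
```

Which reading applies is another point that would need clarification; the second one is the stricter interpretation of “after July 15”.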
Now we’re concerned with encounters with patients between the ages of 18 and 35; we’ll need to join the patients table to handle that.
with_patients <- inner_join(after_date, patients, c("PATIENT" = "Id"))
with_patients
Based upon the wording in the specifications, the patient’s age must be greater than or equal to 18 at the start of an encounter and less than or equal to 35 at the end of the encounter.
Let’s make sure that there are no encounters in our table that have not ended, because a patient could turn 36 by the time the encounter is over.
not_ended <- drop_na(with_patients, STOP)
not_ended
Turns out we’re ok. Let’s do the filtering now. First we’ll need to calculate the age of the patient at the start and end of the encounter.
age <- mutate(not_ended, AGEATSTART = as.period(interval(BIRTHDATE, START))$year)
age <- mutate(age, AGEATSTOP = as.period(interval(BIRTHDATE, STOP))$year)
select(age, Id, AGEATSTART, AGEATSTOP)
aged <- filter(age, AGEATSTART >= 18 & AGEATSTOP <= 35)
aged
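To sanity-check the age calculation, here is a small sketch that applies the same `as.period(interval(...))$year` expression to a made-up birth date. The helper `age_at` and the dates are hypothetical and only illustrate how the whole-year count changes on a birthday.

```r
library(lubridate)

# Hypothetical patient born on 1980-03-10
birth <- ymd("1980-03-10")

# Whole years elapsed between the birth date and a given moment
age_at <- function(birth, when) as.period(interval(birth, when))$year

age_at(birth, ymd_hms("1998-03-09 12:00:00"))  # the day before the 18th birthday
age_at(birth, ymd_hms("1998-03-10 12:00:00"))  # on the 18th birthday
age_at(birth, ymd_hms("2016-03-09 12:00:00"))  # the day before the 36th birthday
age_at(birth, ymd_hms("2016-03-10 12:00:00"))  # on the 36th birthday
```

Because the year component only counts completed years, the filter `AGEATSTART >= 18 & AGEATSTOP <= 35` keeps a patient up to the day before their 36th birthday.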
That finishes up the first task.
With your drug overdose encounter, create the following indicators:
Opioids List:
* Hydromorphone 325Mg
* Fentanyl – 100 MCG
* Oxycodone-acetaminophen 100 Ml
Ok, looking at the data, the only field from which we can infer death is the DEATHDATE column in the patients table. If that date falls within the encounter dates, we’ll mark the indicator as 1. The specifications don’t state what to do if the patient hasn’t died. This would need clarification, but for the purposes of this exercise we’ll leave the indicator blank in those cases.
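# DEATHDATE is a Date while START and STOP are date-times, so compare at the date level.
died <- mutate(aged, DEATH_AT_VISIT_IND = ifelse(DEATHDATE >= as_date(START) & DEATHDATE <= as_date(STOP), 1, 0))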
select(died, START, STOP, DEATHDATE, DEATH_AT_VISIT_IND)
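To check the indicator logic on a small scale, the sketch below runs it on three made-up rows: a death during the visit, a death years later, and no recorded death. DEATHDATE is read as a date while START and STOP are date-times, and comparing the two classes directly may not behave as expected, so this sketch converts the encounter bounds with `as_date()` first. `toy_visits` is a hypothetical table, not study data.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Three made-up visits with identical encounter bounds
toy_visits <- tibble(
  START = ymd_hms(rep("2005-01-10 09:00:00", 3)),
  STOP = ymd_hms(rep("2005-01-12 17:00:00", 3)),
  DEATHDATE = ymd(c("2005-01-11", "2010-06-01", NA))
)

# 1 if the death date falls within the encounter, 0 if it does not,
# and NA (blank) when no death date is recorded
toy_visits <- mutate(
  toy_visits,
  DEATH_AT_VISIT_IND = if_else(
    DEATHDATE >= as_date(START) & DEATHDATE <= as_date(STOP), 1L, 0L
  )
)

toy_visits
```

The missing death date propagates through the comparison, which matches the choice to leave the indicator blank for patients who have not died.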
For COUNT_CURRENT_MEDS we’ll have to use the medications table.
For CURRENT_OPOID_IND we’ll have to look up the codes for the opioids in question; however, the codes in the medications table do not appear to match the results found on https://mor.nlm.nih.gov/RxNav/ (the RxNorm database that the data dictionary mentions). We would need clarification on this, but for this exercise we’ll search by the DESCRIPTION column instead.
Some drugs have multiple components or ingredients, and it is unclear whether we should consider only the pure drugs of interest or these combinations as well. For example: Amlodipine 5 MG / Fentanyl 100 MCG / Olmesartan medoxomil 20 MG vs Fentanyl 100 MCG. We would need clarification, but for this exercise we’ll only examine the pure drugs, because additional ingredients can modulate the effects of the drug in question and because this reading is a little closer to the specification.
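The difference between the two readings can be sketched as follows. The first two descriptions are the ones quoted above, the third is a made-up non-opioid entry, and `descriptions`, `exact`, and `contains` are hypothetical names used only for this illustration.

```r
library(stringr)

descriptions <- c(
  "Fentanyl 100 MCG",
  "Amlodipine 5 MG / Fentanyl 100 MCG / Olmesartan medoxomil 20 MG",
  "Ibuprofen 200 MG"  # made-up non-opioid entry
)

# Exact matching keeps only the pure drug
exact <- descriptions %in% "Fentanyl 100 MCG"

# Substring matching would also flag the combination product
contains <- str_detect(descriptions, fixed("Fentanyl 100 MCG"))

exact
contains
```

Exact matching with `%in%` corresponds to the pure-drug reading; switching to substring matching would be the change to make if combination products turn out to be in scope.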
opoids = c("Hydromorphone 325 MG", "Fentanyl 100 MCG", "Oxycodone-acetaminophen 100ML")
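get_meds <- function(start, pt) {
  # Assumption: a medication is "current" if it started on or before the encounter
  # start date and either has no STOP date or stops on or after that date.
  filter(medications, PATIENT == pt & START <= start & (is.na(STOP) | STOP >= start))
}
current_meds <- died %>%
  # medications stores plain dates, so pass the encounter start as a date as well
  mutate(CURRENTMEDS = pmap(list(as_date(START), PATIENT), get_meds)) %>%
  mutate(COUNT_CURRENT_MEDS = map_int(CURRENTMEDS, nrow)) %>%
  mutate(CURRENT_OPOID_IND = map_int(CURRENTMEDS, function(med) as.integer(any(med$DESCRIPTION %in% opoids))))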
current_meds
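As a check on what “current” should mean, here is a minimal sketch of an active-at-start rule on made-up prescriptions. It assumes a medication is active when it started on or before the encounter start and has either no stop date or a stop date on or after it; the specification may define this differently. `toy_meds`, `encounter_start`, and `active` are hypothetical names.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Made-up prescriptions for a single patient
toy_meds <- tibble(
  DESCRIPTION = c("ended before", "spans encounter", "open-ended", "starts after"),
  START = ymd(c("2004-01-01", "2004-12-01", "2003-05-01", "2005-02-01")),
  STOP = ymd(c("2004-06-30", "2005-03-01", NA, "2005-04-01"))
)

encounter_start <- ymd("2005-01-10")

# Active at the start of the encounter: already started and not yet stopped
active <- filter(
  toy_meds,
  START <= encounter_start,
  is.na(STOP) | STOP >= encounter_start
)

active$DESCRIPTION
```

Under this rule a prescription that ended before the encounter is not counted, while an open-ended prescription is.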